Per-bucket concurrent rehashing algorithms

نویسنده

  • Anton Malakhov
چکیده

This paper describes a generic algorithm for concurrent resizing and on-demand per-bucket rehashing for an extensible hash table. In contrast to known lock-based hash table algorithms, the proposed algorithm separates the resizing and rehashing stages so that they neither invalidate existing buckets nor block any concurrent operations. Instead, the rehashing work is deferred and split across subsequent operations with the table. The rehashing operation uses bucket-level synchronization only and therefore allows a race condition between lookup and moving operations running in different threads. Instead of using explicit synchronization, the algorithm detects the race condition and restarts the lookup operation. In comparison with other lock-based algorithms, the proposed algorithm reduces high-level synchronization on the hot path, improving performance, concurrency, and scalability of the table. The response time of the operations is also more predictable. The algorithm is compatible with cache friendly data layouts for buckets and does not depend on any memory reclamation techniques thus potentially achieving additional performance gain with corresponding implementations.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Split-Ordered Lists: Lock-Free Extensible Hash Tables (a summary)

The hash table is a common data structure for applications requiring constant time insert, delete and find of certain items. The basic idea is to use an array of buckets, where each bucket stores a list of items. A hash function is used to determine which bucket a certain item belongs to. Finding or deleting a specific item amounts to a linear search in a bucket list pointed to by the hash func...

متن کامل

Robust and Efficient Locality Sensitive Hashing for Nearest Neighbor Search in Large Data Sets

Locality sensitive hashing (LSH) has been used extensively as a basis for many data retrieval applications. However, previous approaches, such as random projection and multi-probe hashing, may exhibit high query complexity of up to Θ(n) when the underlying data distribution is highly skewed. This is due to the imbalance in the number of data stored per each bucket, which leads to slow query tim...

متن کامل

Hashing and Rehashing in Emulated Shared Memory Hashing and Rehashing in Emulated Shared Memory

The PRAM model is widely used to formulate parallel algorithms because of its shared memory and its synchronous behaviour. The model however bears little resemblance to real parallel machines. This has led to various approaches to emulating PRAMs on processor networks. We brieey survey the principles behind these emulations and show why hashing is an important part of them. We discuss the commo...

متن کامل

: Parallel Algorithms for Bucket Sorting and the Data Dependent Prefix Problem

The data dependent prefix problem is to compute all the n initial products x1⃝x2⃝...⃝xk, 1 ≤ k ≤ n, where the order is specified by a linked list. A parallel algorithm for the data dependent prefix problem is presented. This algorithm has time complexity O( n p + log n log n p ) using p processors on the exclusive-read exclusive-write computation model. A bucket sorting algorithm is also develo...

متن کامل

Comparison of Bucket Sort and RADIX Sort

Bucket sort and RADIX sort are two well-known integer sorting algorithms. This paper measures empirically what is the time usage and memory consumption for different kinds of input sequences. The algorithms are compared both from a theoretical standpoint but also on how well they do in six different use cases using randomized sequences of numbers. The measurements provide data on how good they ...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:
  • CoRR

دوره abs/1509.02235  شماره 

صفحات  -

تاریخ انتشار 2011